image colorization
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.96)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Prompt-based Consistent Video Colorization
Dani, Silvia, Uricchio, Tiberio, Seidenari, Lorenzo
Existing video colorization methods struggle with temporal flickering or demand extensive manual input. We propose a novel approach automating high-fidelity video colorization using rich semantic guidance derived from language and segmentation. We employ a language-conditioned diffusion model to colorize grayscale frames. Guidance is provided via automatically generated object masks and textual prompts; our primary automatic method uses a generic prompt, achieving state-of-the-art results without specific color input. Temporal stability is achieved by warping color information from previous frames using optical flow (RAFT); a correction step detects and fixes inconsistencies introduced by warping. Evaluations on standard benchmarks (DAVIS30, VIDEVO20) show our method achieves state-of-the-art performance in colorization accuracy (PSNR) and visual realism (Colorfulness, CDC), demonstrating the efficacy of automated prompt-based guidance for consistent video colorization.
- Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
- Asia > Middle East > Jordan (0.04)
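The flow-based propagation described in the abstract above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The flow is assumed to come from an estimator such as RAFT and to point from each current-frame pixel to its match in the previous frame. The helper names, the nearest-neighbour lookup, the luminance-difference test, and the `tol` threshold are all hypothetical choices made for illustration; the abstract does not say how the correction step detects inconsistencies.

```python
import numpy as np

def backward_warp(prev, flow):
    """Sample `prev` (H, W, ...) at the positions given by `flow` (H, W, 2).

    Assumption: flow[y, x] = (dx, dy) points from pixel (x, y) in the current
    frame to its match in the previous frame. Nearest-neighbour lookup.
    Returns the warped array and a mask of pixels whose match is in bounds.
    """
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    in_bounds = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    src_x = np.clip(src_x, 0, w - 1)
    src_y = np.clip(src_y, 0, h - 1)
    return prev[src_y, src_x], in_bounds

def propagate_chroma(prev_gray, prev_chroma, cur_gray, cur_chroma, flow, tol=0.1):
    """Reuse warped chroma where warping looks reliable, else keep cur_chroma.

    A pixel is flagged unreliable when its match falls outside the frame or
    when the warped luminance disagrees with the current luminance by more
    than `tol` (a hypothetical criterion, not taken from the paper).
    """
    warped_gray, in_bounds = backward_warp(prev_gray, flow)
    warped_chroma, _ = backward_warp(prev_chroma, flow)
    unreliable = ~in_bounds | (np.abs(warped_gray - cur_gray) > tol)
    merged = np.where(unreliable[..., None], cur_chroma, warped_chroma)
    return merged, unreliable
```

Here `cur_chroma` stands for a fresh per-frame colorization (for example, the output of the diffusion model on the current frame), used only where the warped colors are rejected.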
Supplementary Material: L-CAD: Language-based Colorization with Any-level Descriptions using Diffusion Priors
To demonstrate the effectiveness of our proposed luminance-guided image compression, semantic-aligned latent representation, and instance-aware sampling strategy (details in Sec. We demonstrate our generalization capability by showing more colorization results on legacy black-and-white photos in Figure 1, where results are presented sequentially from left to right using descriptions at the complete, partial, and scarce levels. Learning to color from language.
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.96)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Understanding SOAP from the Perspective of Gradient Whitening
Lu, Yanqing, Wang, Letao, Liu, Jinbo
Shampoo with Adam in the Preconditioner's eigenbasis (SOAP) has recently emerged as a promising optimization algorithm for neural network training, achieving superior training efficiency over both Adam and Shampoo in language modeling tasks. In this work, we analyze Adam, Shampoo, and SOAP from the perspective of gradient whitening, interpreting their preconditioners as approximations to the whitening matrix, which captures second-order curvature information. We further establish a theoretical equivalence between idealized versions of SOAP and Shampoo under the Kronecker product assumption. To empirically evaluate these insights, we reproduce the language modeling experiments using nanoGPT and grayscale image colorization. Our results show that SOAP exhibits a convergence rate similar to Shampoo's and no significant advantage over either Adam or Shampoo in the final loss achieved, which is consistent with their theoretical equivalence.
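To make the role of the Kronecker product assumption concrete, here is a small numerical check of a standard linear-algebra identity. It does not reproduce the paper's derivation. Whitening is taken to mean multiplying the flattened gradient by the inverse square root of a covariance matrix H. If H factorizes as R ⊗ L, with L of size m×m and R of size n×n, the same result is obtained from the two small factors alone. The function names and the random test matrices are hypothetical, and the exact exponents and estimators used by Shampoo and SOAP are not modeled.

```python
import numpy as np

def inv_sqrt(spd):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(spd)
    return (vecs / np.sqrt(vals)) @ vecs.T

def whiten_full(grad, left, right):
    """Whiten vec(grad) with the full (m*n x m*n) matrix right ⊗ left."""
    full = np.kron(right, left)
    return inv_sqrt(full) @ grad.flatten(order="F")  # column-major vec

def whiten_factored(grad, left, right):
    """Same result using only the small factors: L^{-1/2} G R^{-1/2}."""
    return (inv_sqrt(left) @ grad @ inv_sqrt(right)).flatten(order="F")

def random_spd(n, rng):
    """Well-conditioned random SPD matrix (illustrative test data only)."""
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)

rng = np.random.default_rng(0)
m, n = 3, 4
L_factor, R_factor = random_spd(m, rng), random_spd(n, rng)
G = rng.standard_normal((m, n))
print(np.allclose(whiten_full(G, L_factor, R_factor),
                  whiten_factored(G, L_factor, R_factor)))
```

The factored form never builds the mn×mn matrix, which is the practical appeal of Kronecker-structured preconditioners.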
Instance-aware Image Colorization with Controllable Textual Descriptions and Segmentation Masks
An, Yanru, Gui, Ling, Cai, Chunlei, Ye, Tianxiao, Yao, Jiangchao, Zhai, Guangtao, Hu, Qiang, Zhang, Xiaoyun
Recently, the application of deep learning in image colorization has received widespread attention. The maturation of diffusion models has further advanced the development of image colorization models. However, current mainstream image colorization models still face issues such as color bleeding and color binding errors, and cannot colorize images at the instance level. In this paper, we propose a diffusion-based colorization method, MT-Color, to achieve precise instance-aware colorization with user-provided guidance. To tackle the color bleeding issue, we design a pixel-level mask attention mechanism that integrates latent features and conditional gray image features through cross-attention. We use segmentation masks to construct cross-attention masks, preventing pixel information from being exchanged between different instances. We also introduce an instance mask and text guidance module that extracts instance masks and text representations of each instance, which are then fused with latent features through self-attention, utilizing instance masks to form self-attention masks to prevent instance texts from guiding the colorization of other areas, thus mitigating color binding errors. Furthermore, we apply a multi-instance sampling strategy, which involves sampling each instance region separately and then fusing the results. Additionally, we have created a specialized dataset for instance-level colorization tasks, GPT-color, by leveraging large visual language models on existing image datasets. Qualitative and quantitative experiments show that our model and dataset outperform previous methods and datasets.
- Asia > China > Shanghai > Shanghai (0.04)
- Europe > Switzerland (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
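The idea of building attention masks from segmentation masks, as described in the MT-Color abstract above, can be sketched in a few lines of NumPy. This is a generic masked-attention illustration and not the MT-Color implementation. The function name, the per-token integer instance labels, and the single-head formulation without learned projections are assumptions made here. The sketch also assumes every query has at least one key with the same label, since otherwise its softmax row would be undefined.

```python
import numpy as np

def instance_masked_attention(q, k, v, q_labels, k_labels):
    """Scaled dot-product attention restricted to tokens of the same instance.

    q: (Nq, d), k: (Nk, d), v: (Nk, dv)
    q_labels: (Nq,), k_labels: (Nk,) integer instance ids taken from a
    segmentation mask (hypothetical inputs for this sketch).
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # A query may only attend to keys that belong to the same instance.
    allowed = q_labels[:, None] == k_labels[None, :]
    scores = np.where(allowed, scores, -np.inf)
    # Numerically stable softmax; masked entries get weight exactly 0.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

Because the masked weights are exactly zero, changing the values that belong to one instance cannot alter the output for tokens of another instance, which is the property that counters color bleeding in this simplified setting.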
MTSIC: Multi-stage Transformer-based GAN for Spectral Infrared Image Colorization
Liu, Tingting, Liu, Yuan, Tang, Jinhui, Yuan, Liyin, Liu, Chengyu, Li, Chunlai, Sui, Xiubao, Chen, Qian
Thermal infrared (TIR) images, acquired through thermal radiation imaging, are unaffected by variations in lighting conditions and atmospheric haze. However, TIR images inherently lack color and texture information, limiting downstream tasks and potentially causing visual fatigue. Existing colorization methods primarily rely on single-band images with limited spectral information and insufficient feature extraction capabilities, which often result in image distortion and semantic ambiguity. In contrast, multiband infrared imagery provides richer spectral data, facilitating the preservation of finer details and enhancing semantic accuracy. In this paper, we propose a generative adversarial network (GAN)-based framework designed to integrate spectral information to enhance the colorization of infrared images. The framework employs a multi-stage spectral self-attention Transformer network (MTSIC) as the generator. Each spectral feature is treated as a token for self-attention computation, and a multi-head self-attention mechanism forms a spatial-spectral attention residual block (SARB), achieving multi-band feature mapping and reducing semantic confusion. Multiple SARB units are integrated into a Transformer-based single-stage network (STformer), which uses a U-shaped architecture to extract contextual information, combined with multi-scale wavelet blocks (MSWB) to align semantic information in the spatial-frequency dual domain. Multiple STformer modules are cascaded to form MTSIC, progressively optimizing the reconstruction quality. Experimental results demonstrate that the proposed method significantly outperforms traditional techniques and effectively enhances the visual quality of infrared images. Unlike visible-light images, TIR images are typically grayscale, lacking both color and fine texture details [2]. The human visual system can discern thousands of hues and intensities, but only around two dozen shades of gray [3].
Prolonged viewing of grayscale images can also lead to visual fatigue, further highlighting the necessity of colorization.
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Asia > China > Zhejiang Province (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
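The statement in the MTSIC abstract that each spectral feature is treated as a token can be illustrated with a single-head sketch in which attention runs across bands rather than across pixels. This is only one plausible reading and not the SARB block itself. The function name, the projection matrices `wq`, `wk`, `wv`, and the scaling by the number of pixels are assumptions introduced for illustration.

```python
import numpy as np

def spectral_self_attention(x, wq, wk, wv):
    """Self-attention over spectral bands.

    x: (N, C) feature map flattened to N pixels and C spectral channels.
    wq, wk, wv: (C, C) projection matrices (hypothetical parameters).
    Each of the C bands becomes one token of length N, so the attention
    map is (C, C) regardless of the image size.
    """
    q = (x @ wq).T  # (C, N): one row per spectral token
    k = (x @ wk).T
    v = (x @ wv).T
    scores = q @ k.T / np.sqrt(q.shape[1])  # (C, C) band-to-band similarity
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return (weights @ v).T, weights  # back to (N, C)
```

A practical consequence of this layout is that the attention cost grows with the square of the number of bands rather than the square of the number of pixels.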
Convolutional Deep Colorization for Image Compression: A Color Grid Based Approach
Tassin, Ian, Goebel, Kristen, Lasher, Brittany
The search for image compression optimization techniques is a topic of constant interest both in and out of academic circles. One method that shows promise toward future improvements in this field is image colorization since image colorization algorithms can reduce the amount of color data that needs to be stored for an image. Our work focuses on optimizing a color grid based approach to fully-automated image color information retention with regard to convolutional colorization network architecture for the purposes of image compression. More generally, using a convolutional neural network for image re-colorization, we want to minimize the amount of color information that is stored while still being able to faithfully re-color images. Our results yielded a promising image compression ratio, while still allowing for successful image recolorization reaching high CSIM values.
- North America > United States > Oregon (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
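As a rough illustration of why a color grid reduces stored color data, the sketch below keeps chroma samples only at regularly spaced grid points and reports the fraction of color values retained. The abstract does not specify the grid layout or the network, so the regular `step` spacing and both function names are hypothetical. The recolorization network that would reconstruct full color from the grayscale image plus this grid is not modeled.

```python
import numpy as np

def sample_color_grid(chroma, step):
    """Keep one chroma sample every `step` pixels along each axis.

    chroma: (H, W, 2) array holding the two color channels of an image
    whose luminance is stored separately at full resolution.
    """
    return chroma[::step, ::step].copy()

def stored_color_fraction(chroma, step):
    """Fraction of the original chroma values that the grid retains."""
    return sample_color_grid(chroma, step).size / chroma.size
```

For a 64×64 image and `step=8`, the grid holds 8×8 samples per channel, so only 1/64 of the color values are stored alongside the full-resolution grayscale image.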
Transforming Color: A Novel Image Colorization Method
This paper introduces a novel method for image colorization that utilizes a color transformer and generative adversarial networks (GANs) to address the challenge of generating visually appealing colorized images. Conventional approaches often struggle with capturing long-range dependencies and producing realistic colorizations. The proposed method integrates a transformer architecture to capture global information and a GAN framework to improve visual quality. In this study, a color encoder that utilizes a random normal distribution to generate color features is applied. These features are then integrated with grayscale image features to enhance the overall representation of the images. Our method demonstrates superior performance compared with existing approaches by combining the transformer's capacity to capture long-range dependencies with the GAN's ability to generate realistic colorizations. Experimental results show that the proposed network significantly outperforms other state-of-the-art colorization techniques, highlighting its potential for image colorization. This research opens new possibilities for precise and visually compelling image colorization in domains such as digital restoration and historical image analysis.
- Europe > Austria > Vienna (0.14)
- North America > United States (0.04)
- Europe > Switzerland (0.04)
- Asia > South Korea > Gwangju > Gwangju (0.04)